<!DOCTYPE html>
<html lang="en">

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Simple GCC projects</title>
<link rel="stylesheet" type="text/css" href="https://gcc.gnu.org/gcc.css" />
</head>

<body>
<h1>Simple GCC projects</h1>

<p>This page lists projects which are feasible for people who aren't
intimately familiar with GCC's internals.  Many of them are things
which would be extremely helpful if they got done, but the core team
never seems to get around to them.  They're all busy wrestling with
the problems that <em>do</em> require deep familiarity with the
internals.  We hope this will make it easier for more people to assist
the GCC project, by giving new developers places to jump in.</p>

<p>Most of these projects require a reasonable amount of experience
with C and the Unix programming environment.  Do not despair if any
individual task seems daunting; there's probably an easier one.  If
you have <em>no</em> programming skills, we can still use your <a
href="documentation.html">help with documentation</a> or our bug tracker.</p>

<p>We assume that you already know how to get the
latest sources, configure and build the compiler, and run the test
suite.  You should also familiarize yourself with the
<a href="../contribute.html">requirements for contributions</a> to GCC.</p>

<p>Many of these projects will require at least a reading knowledge of
GCC's intermediate language,
<a href="https://gcc.gnu.org/onlinedocs/gccint/RTL.html">RTL</a>.
It may help to understand the higher-level <code>tree</code> structure as
well.  Unfortunately, for this we only have an <a
href="https://gcc.gnu.org/onlinedocs/gccint/GENERIC.html">incomplete, C/C++ specific manual</a>.</p>

<p>Remember to <a href="../contributewhy.html">keep other developers
informed</a> of any substantial projects you intend to work on.</p>

<h2>Bug patrol</h2>

<p>These projects all have to do with bugs in the compiler, and our
testsuite which is supposed to make sure no bugs come back.</p>

<ul>
<li>Analyze failing test cases.

<p>Pick a test case which fails (expected or unexpected) with the
present compiler, and try to figure out what's going wrong.  For
internal compiler errors ("ICEs") often you can find the problem by
running <code>cc1</code> under the debugger.  Set a breakpoint on
<code>fancy_abort</code> (this happens automatically if you work in
your build directory).  When gdb stops, go up the stack to the
function that called <code>fancy_abort</code>.  It will have just
performed some sort of consistency check, which failed.  Normally this
check will be visible right there.  (If the ICE prints "Tree check:"
or "RTL check:" before the usual message, the check is hiding in the
accessor macros.)  Examine the data structure that was checked.  Walk
back in time and figure out when it got messed up.</p>

<p>There are a large number of routines which you can call from the
debugger, to display internal data in readable form.  Their names all
begin with "<samp>debug_</samp>".  The most useful ones are
<code>debug_tree</code> for printing tree structures,
<code>debug_rtx</code> for printing chunks of RTL, and
<code>debug_bb</code> and <code>debug_bb_n</code> for printing out
basic block information.</p>

<p>If the problem is that the compiler generates incorrect code, the
place to start is the RTL debugging dumps.  Run the compiler with the
<samp>-da</samp> switch.  This will generate twenty or so debug dumps,
one after each pass.  Read through them in order (they are numbered).
The code should start off correct, but then become erroneous.  When
you find the mistake, enter the debugger, set a breakpoint on the pass
that made the mistake, and watch what it does.  You can find out the
name of the entry point for each pass by reading through
<code>rest_of_compilation</code> in <code>toplev.c</code>.</p>
</li>

<li>Get rid of <code>testsuite/gcc.misc-tests</code> and
<code>testsuite/g++.dg/special.</code>

<p>These are a handful of tests each that aren't handled by the normal
test sequence.  We'd like to get rid of the special case framework.
We <em>think</em> that they're only done this way for historical
reasons, but we aren't sure.  Most of the work would be figuring out
what's going on in those directories.  You'll need some understanding
of Expect, TCL, and the DejaGNU test harness.</p>
</li>

<li>Cross-reference all the tests and find all the duplicates.

<p>It's likely that the same test has been added more than once, over
the years.  You'd need to figure out a sensible definition of "the
same test" that can be checked mechanically, then write a program that
does that check, and run it against the entire testsuite.</p>
</li>

<li>Perform additional GCC testing.

<p>See <a href="../testing/">GCC Testing Efforts</a>
for ideas and information about what's already being done.</p>
</li>

</ul>

<h2>General code cleanliness</h2>

<p>These are projects which will generally make it easier to work with
the source tree.</p>

<ul>
<li>Warnings patrol.

<p>Simple: build the tree, run the <code>warn_summary</code> script
(from the <code>contrib</code> directory) against your build log, then
go through the list and squelch the warnings.  In most cases this is
easy.  However, if you have any doubt about what some piece of code
does, ask.  Sometimes the proper fix is not obvious.  For example,
there are a lot of warnings about "comparison between signed and
unsigned" in a GCC build, but unless you really know what you're
doing, you should leave them alone.</p>

<p>Also, some warnings are spurious.  If you can patch the part of the
compiler that issues spurious warnings, so it doesn't anymore (but
still does generate the warning where it's appropriate), we're happy
to take those patches too.</p>
</li>

<li>Break up enormous source files.

<p>Not terribly hard.  Watch out for file-scope globals.  Suggested
targets:</p>

<pre>
	1.4M	cp/parser.c
	1000K	dwarf2out.c
	932K	cp/pt.c
	684K	c/c-parser.c
	576K	cp/decl.c
	516K	cp/module.cc
	512K	fold-const.c
	500K	gimplify.c
	492K	c/c-typeck.c
	484K	combine.c
	460K	omp-low.c
	432K	tree.c
	408K	expr.c
	404K	cp/call.c
</pre>

<p>There are several other files in this size range, which are left
out because touching them at all is unwise (reload, the Fortran front
end).  You can try, but breaking them up likely may prove rather
difficult, so beware of the level of challenge before attempting.</p>

<p>Note that the list of large files in this section is generated with the
following shell command, run from the gcc subdirectory:</p>

<pre>
	du -sh *.{c,h,cc} */*.{c,h,cc} | grep -v fortran | sort -hr | head -n 14
</pre>
</li>

<li>Remove as much code from parser actions as possible.

<p>This goes more or less with the above.  Good existing code:</p>

<pre>
expr_no_commas:
        expr_no_commas '+' expr_no_commas
                { $$ = parser_build_binary_op ($2, $1, $3); }
</pre>

<p>Bad existing code:</p>

<pre>
cast_expr:
        '(' typename ')' cast_expr  %prec UNARY
                { tree type;
                  int SAVED_warn_strict_prototypes = warn_strict_prototypes;
                  /* This avoids warnings about unprototyped casts on
                     integers.  E.g. "#define SIG_DFL (void(*)())0".  */
                  if (TREE_CODE ($4) == INTEGER_CST)
                    warn_strict_prototypes = 0;
                  type = groktypename ($2);
                  warn_strict_prototypes = SAVED_warn_strict_prototypes;
                  $$ = build_c_cast (type, $4); }
</pre>

<p>All the logic here should be moved into a separate function in
c-typeck.c, named something like parser_build_c_cast.  The point of
doing this is, the less code in Yacc input files, the easier it is to
rearrange the grammar and/or replace it entirely.  Also it makes it
less likely that someone will muck with action code and then forget to
rebuild the generated parser and check it in.</p>

<p>We also want to minimize the number of helper functions embedded in
the grammar file.</p>
</li>

<li>Break up enormous functions.

<p>This is in the same vein as the above, but significantly harder,
because you must take care not to change any semantics.  The general
idea is to extract independent chunks of code to their own functions.
Any inner block that has a half dozen local variable declarations at
its head is a good candidate.  However, watch out for places where
those local variables communicate information between iterations of
the outer loop!</p>

<p>With even greater caution, you may be able to find places where
entire blocks of code are duplicated between large functions (probably
with slight differences) and factor them out.</p>
</li>

<li>Break up enormous conditionals.

<p>Harder still, because it's unlikely that you can tell what the
conditional tests, and even less likely that you can tell if that's
what it's supposed to test.  It is definitely worth the effort if you
can hack it, though.  An example of the sort of thing we want
changed:</p>

<pre>
 if (mode1 == VOIDmode
     || GET_CODE (op0) == REG || GET_CODE (op0) == SUBREG
     || (modifier != EXPAND_CONST_ADDRESS
         &amp;&amp; modifier != EXPAND_INITIALIZER
         &amp;&amp; ((mode1 != BLKmode &amp;&amp; ! direct_load[(int) mode1]
              &amp;&amp; GET_MODE_CLASS (mode) != MODE_COMPLEX_INT
              &amp;&amp; GET_MODE_CLASS (mode) != MODE_COMPLEX_FLOAT)
             /* If the field isn't aligned enough to fetch as a memref,
                fetch it as a bit field.  */
             || (mode1 != BLKmode  
                 &amp;&amp; SLOW_UNALIGNED_ACCESS (mode1, alignment)
                 &amp;&amp; ((TYPE_ALIGN (TREE_TYPE (tem))
                      &lt; GET_MODE_ALIGNMENT (mode))
                     || (bitpos % GET_MODE_ALIGNMENT (mode) != 0)))
             /* If the type and the field are a constant size and the
                size of the type isn't the same size as the bitfield,
                we must use bitfield operations.  */
             || ((bitsize &gt;= 0
                  &amp;&amp; (TREE_CODE (TYPE_SIZE (TREE_TYPE (exp)))
                      == INTEGER_CST)
                  &amp;&amp; 0 != compare_tree_int (TYPE_SIZE (TREE_TYPE (exp)),
                                            bitsize)))))
     || (modifier != EXPAND_CONST_ADDRESS
         &amp;&amp; modifier != EXPAND_INITIALIZER
         &amp;&amp; mode == BLKmode
         &amp;&amp; SLOW_UNALIGNED_ACCESS (mode, alignment)
         &amp;&amp; (TYPE_ALIGN (type) &gt; alignment
             || bitpos % TYPE_ALIGN (type) != 0)))
   {
</pre>
</li>

<li>Verify all the object-&gt;header dependencies in the Makefiles.

<p>Mega bonus points for working out a way to do automatic dependency
generation <em>without</em> relying on features of GCC or GNU
make.  And we don't want a <samp>make dep</samp> pass if it can
possibly be avoided.</p>
</li>

<li>Figure out some way to get dependencies of source files on
<code>tm.h</code> and <code>xm-<var>host</var>.h</code> headers.

<p>Presently these dependencies are omitted entirely.  Almost
everything has to be rebuilt if you change <code>tm.h</code> or
<code>xm-<var>host</var>.h</code>, and right now the only way to do
that is rebuild from scratch.</p>
</li>

<li>Delete garbage.

<p><code>#if 0</code> blocks that have been there for years, unused
functions, unused entire files, dead configurations, dead Makefile
logic, dead RTL and tree forms, and on and on and on.  Depending on
what it is, it may not be obvious if it's garbage or not.  Go for the
easy ones first.</p>
</li>

<li>Revisit issues put off till later.

<p>Find comments of the form /* Look at this again after gcc 2.3 */,
or /* ... after <var>date</var> */ where <var>date</var> was sometime in
the last millennium, and investigate.  Analyze test cases marked XFAIL
and patch them.</p>
</li>

<li>Use predicates for RTL objects

<p>GCC has simple predicates to see if a given <code>rtx</code> is of some
specific class.  These predicates simply look at the <code>rtx_code</code>
of the given RTL object and return nonzero if the predicate is true.
For example, if an <code>rtx</code> represents a register, then
<code>REG_P (rtx)</code> is nonzero.</p>

<p>Unfortunately, lots of code in the middle end and in the back ends does
not use these predicates and instead compare the <code>rtx_code</code>
in place: <code>(GET_CODE (rtx) == REG)</code>.  Find all the places where
such comparisons can be replaced with a predicate.  Also, for many common
comparisons there is no predicate yet.  See which ones are worth having
a predicate for, and add them.  You can find a number of
<a href="https://gcc.gnu.org/ml/gcc/2004-05/msg00447.html">suggestions</a>
in the mailing list archives.</p>
</li>

<li>Disentangle the current web of header-header interdependencies.

<p>This is a major undertaking, and you should be able to deal with
all kinds of lurking monsters.</p>

<p>At present, most of GCC's internal headers use whatever they need
without any consideration for whether or not it has been declared yet.
This forces the users of those headers to know what each one needs,
and use it explicitly.  Worse, there is no simple or even documented
relation between the source file where something is defined, and the
header where it is declared.</p>

<p>There are some horrible kludges lurking here and there.  In places
we avoid prototyping things if we haven't seen necessary typedefs, for
example.  Some things are declared in several different headers, each
used by a disjoint subset of the source.  Odds are that some of those
duplicates don't match the definition.</p>

<p>Your goals for this project:</p>

<ol>
  <li><p>It should be possible to include any header without having to
  worry about what its dependencies are; i.e. all headers should
  explicitly pull in their dependencies.  (like the standard library
  headers).</p>

  <p>As an exception, headers should not explicitly reference
  <code>config.h</code>, <code>system.h</code>, or
  <code>ansidecl.h</code>.  Nor should they reference any headers
  explicitly included by <code>system.h</code>, such as
  <code>stdio.h</code>.  They <em>should</em> reference other headers
  from libiberty or libc, where necessary.</p></li>

  <li><p>Each function, global declaration, or type definition should
  appear in exactly one header.  Forward declarations of structs and
  unions do not count.</p></li>

  <li><p>That one header should have an obvious relationship to the
  nature of the thing being declared.  It should never be necessary to
  grep the entire source tree to figure out which header you need.</p></li>

  <li><p>Each header should have the minimum possible number of
  references to other headers.  If a header describes ten routines,
  two of which require <code>rtl.h</code>, and the other eight are
  useful by themselves, then the header should be split so that they
  can be used without dragging in RTL.  Possibly the corresponding
  source file should be split to match.</p></li>
</ol>
</li>

<li>Disambiguate flags.

<p>Find all the places where one flag bit is used with several
different meanings depending what sort of tree or RTL it is in, and
give each different meaning a different accessor macro.  Augment the
tree/RTL checking macros so they verify that the accessors match the
data.</p>
</li>

<li>Rename routines used by the debugging information generators, so
they do not occupy the same namespace as routines intended to be used
when debugging the compiler.

<p>Currently, if you ask gdb for a list of all the functions whose
names begin with "<samp>debug_</samp>", you get a mixed bag of
data structure dumpers and debug-info generators:</p>

<pre>
(gdb) call debug_&lt;TAB&gt;&lt;TAB&gt;
debug_args                      debug_line_section_label
debug_bb                        debug_loop
debug_bb_n                      debug_loops
debug_binfo                     debug_name
debug_bitmap                    debug_no_type_hash
debug_bitmap_file               debug_print_page_list
debug_biv                       debug_ready_list
debug_call_placeholder_verbose  debug_real
debug_candidate                 debug_regions
debug_candidates                debug_regset
debug_define                    debug_reload
debug_dependencies              debug_reload_to_stream
debug_dwarf                     debug_rli
debug_dwarf_die                 debug_rtx
debug_end_source_file           debug_rtx_count
debug_flow_info                 debug_rtx_find
debug_giv                       debug_rtx_list
debug_ignore_block              debug_rtx_range
debug_info_level                debug_sbitmap
debug_info_section_label        debug_start_source_file
debug_info_type                 debug_stderr
debug_insn                      debug_tree
debug_iv_class                  debug_type_names.2
debug_ivs                       debug_undef
</pre>

<p>It is not at all obvious which is which.  Rename functions so that
everything which is useful from the debugger has a name starting with
<samp>debug_</samp>, and nothing else does.</p>
</li>

<li>Change code formatting to follow the <a
href="../codingconventions.html">GCC coding conventions</a>
consistently.</li>

<li>Change code to follow the coding conventions in other ways.  For
example, change arbitrary hardcoded parameters to use the
<code>--param</code> mechanism.</li>

</ul>

<h2>Port cleanliness</h2>

<p>This involves mostly bringing back ends up to date with the current
state of the art in the machine-independent code.  Many ports date
back to the 1980s and have not been actively maintained since then.
There is also work to be done in cleaning up the places where the MI
code uses machine-specific macros.</p>

<p>In addition to understanding RTL, you need to read the <a
href="https://gcc.gnu.org/onlinedocs/gccint/Machine-Desc.html">machine description</a> and <a
href="https://gcc.gnu.org/onlinedocs/gccint/Target-Macros.html">target macros</a> sections of the GCC
manual.</p>

<ul>
<li>Migrate default definitions of <code>tm.h</code> macros out of
random source files into <code>defaults.h</code>.

<p>It would be a lot more work, but we might consider including
<code>defaults.h</code> <em>first</em>, have it define everything
unconditionally, then have <code>tm.h</code>'s <code>#undef</code>
whatever they need to override.</p>
</li>

<li>Remove commented-out definitions of macros and descriptions of
macros which ports do not use from all <code>tm.h</code> files.

<p>This is so that grepping for all the uses of a particular macro
will get no false positives.</p>
</li>

<li>Remove comments above macro definitions in the <code>tm.h</code>
files that only describe the meaning of the macro and say nothing specific
to that machine.

<p>These comments have largely been copied from one <code>tm.h</code>
file to another, and many may be out of date by now.  Target macros
should be documented in <code>tm.texi</code> only, not in the
individual target headers.  However, where there are comments
describing the reason for a particular target's choice of definition,
or saying something about that choice beyond repeating what the
definition means, those comments should be preserved.</p>

<p>When removing comments describing target macros (whether on
definitions of those macros, or on commented-out definitions), make
sure that the macro is documented in <code>tm.texi</code> and the
comments don't say anything more that ought to be in the manual.</p>
</li>

<li>Convert huge macros in each <code>tm.h</code> to functions in the
corresponding <code>tm.c</code>.

<p>This can be tricky when a huge macro is defined not by the general
<code>tm.h</code> for a processor, but the specific one for some
particular target triple.  The best known approach here is to set some
flag macros in the target-specific <code>tm.h</code>, then
<code>#ifdef</code> up the function in <code>tm.c</code>.  Better
ideas would be appreciated.</p>
</li>

<li>Adjust the definitions of porting macros to make the above easier.

<p>There are some macros that need a lengthy definition, and are
required to perform a <code>goto</code> to a label outside the macro
under certain conditions.  This makes moving all the logic into a
separate function difficult.  These macros should be replaced by
new macros which return a flag instead.  The goto then happens in the
code that uses the macro.</p>
</li>

<li>Convert configurations to the new style where tm.h chunks do not
include each other incestuously.

<p>Instead, <code>config.gcc</code> lists each chunk explicitly, in
order from least to most specific.</p>
<!-- XXX Can someone describe this better? -->
</li>

<li>Clean up <code>#ifdef</code> messes in <code>tm.h</code> chunks.

<p>The preferred style is: Chunks are used in order from least to most
specific.  Each chunk mentions only the macros it has specific
definitions for.  Each chunk <code>#undef</code>s any previous
definition first.  (Contrary to popular belief, it is always safe to
<code>#undef</code> a macro, whether or not it has already been
defined.)</p>
</li>

<li>Make porting macros testable at runtime.

<p>We'd like to be able to change more of the compiler's behavior at
runtime using <samp>-m</samp> switches.  To do this, regions of code
that presently read</p>

<pre>
     #ifdef MACRO
       ... code ...
     #endif
</pre>

<p>must become instead</p>

<pre>
     #ifdef MACRO
       if (MACRO)  
         ... code ...
     #endif
</pre>

<p>If possible (this depends on which macro it is) a third form is
even better: in <code>defaults.h</code></p>

<pre>
	#ifndef MACRO
	#define MACRO 0
	#endif
</pre>

<p>and then the users become simply</p>

<pre>
	  if (MACRO)
	    ... code ...
</pre>

<p>This style subjects more code to compile-time checking, so bit-rot
in obscure target-specific features is more likely to be noticed.</p>
</li>

<li>Convert text peepholes to RTL peepholes.

<p>GCC has two forms of peephole optimization: the old style that
edited the text assembly output as it was being generated, and the new
style that transforms RTL to RTL.  The new form is conceptually
cleaner and requires less gunk in the implementation.</p>

<p>The targets with text peepholes are:</p>
<pre>
  arm avr c4x cris fr30 ip2k m32r m68hc11 m68k
  mcore mips mn10300 ns32k pa rs6000 sh.
</pre>
</li>

<li>Convert text prologue/epilogue generation to use expanders
instead.

<p>As with peepholes, there is an old style and a new.  The old style
uses the <code>TARGET_ASM_FUNCTION_PROLOGUE</code> and
<code>TARGET_ASM_FUNCTION_EPILOGUE</code> macros, which insert text
directly into the output.  The new style uses the
<code>prologue</code> and <code>epilogue</code> named expanders to
generate RTL.</p>

<p>The situation here is a bit weird.  Targets which only have
<code>TARGET_ASM_FUNCTION_PROLOGUE/EPILOGUE</code> in
<code>tm.h</code> are:</p>
<pre>
  arc avr m68k ns32k pdp11 vax
</pre>
<p>Targets which only have <code>prologue</code> and <code>epilogue</code>
named expanders are:</p>
<pre>
  alpha c4x h8300 fr30 m68hc11 mcore mn10300 sh
</pre>
<p>Targets which have <em>both</em> are:</p>
<pre>
  arm i386 ia64 m32r mips pa rs6000 sparc
</pre>
<p>I'd suggest starting with the targets that have both.</p>
</li>

<li>Find magic numbers in <samp>.md</samp> files and make them use
<code>define_constants</code> instead.

<p><code>define_constants</code> is brand new, so few targets know
about it.  It is most useful for things like fixed register numbers.
Constants defined with it are also visible to C code via the
<code>insn-codes.h</code> header.</p>
</li>

<li>Correct all warnings and errors emitted by <code>gen*.c</code> in
the course of a bootstrap.

<p>This may require pretty detailed knowledge of the way machine
definition files are supposed to be written, unfortunately.  For the
more exotic targets, you can usually start by building a
cross-compiler from whatever you have to &lt;processor&gt;-unknown-none.  It
doesn't have to <em>work</em>, just build far enough to run the MD
generators.</p>
</li>

<li>Store attribute lists in canonical form.

<p>Consider making the adjustments described in the comment above
the definition of <code>is_attribute_p</code>: caller is required to
state the unqualified form of the name, not the underscored form; all
internal attribute lists remember the unqualified form, no matter what
was used in the code.</p>
</li>

<li>Convert md files that use <code>(cc0)</code> so they don't anymore.

<p>This is hard, but would be a great improvement to the compiler if
it were done for all existing targets.  The basic idea is that</p>

<pre>
(insn ### {cmpsi} (set (cc0) (compare (reg:SI A) (reg:SI B))))
(insn ### {bgt} (set (pc) (if_then_else
                        (gt (cc0) (const_int 0))
                        (label_ref 23)
                        (pc)))
</pre>

<p>becomes</p>

<pre>
(insn ### {bsicc} (set (pc) (if_then_else
                        (gt:SI (reg:SI A) (reg:SI B))
                        (label_ref 23)
                        (bc)))
</pre>

<p>Unfortunately, the technique is very poorly documented and may need
extending to other conditional operations (setcc, movcc) as well.</p>
</li>

<li>Find hooks in the machine-independent code which aren't used by
any target anymore, and remove them.

<p>Right now there probably aren't too many of these, but there will
be once some of the above projects get rolling.</p>
</li>
</ul>

<h2>Configuration and Makefiles</h2>

<p>This largely consists of the same sort of thing as the above, but
for per-host configuration instead of per-target.  You will need to
understand autoconf, or Make, to do these projects.</p>

<ul>

<li>Find places that are still using obsolete system-category macros
(<code>USG</code>, <code>POSIX</code>, etc) and autoconfiscate them.

<p><code>tsystem.h</code> uses <code>USG</code> and a couple others to
know if it can safely include <code>string.h</code> and
<code>time.h</code>.  As both of them are required by C99, we should
just synthesize them and include them unconditionally.  (fixproto
already does this for <code>stdlib.h</code> and several others.)</p>

<p>The real mess is in the debug info generators.</p>
</li>

<li>Run fixincludes on all targets.

<p>We want all targets' headers to be handled the same way.  The
existing practice causes hard-to-find bugs which only manifest on
platforms that are unpopular, so they never get fixed.</p>
</li>

<li>Get as much as possible out of the <code>t-<var>target</var></code>
Makefile fragments.

<p>It's unlikely that these can be eliminated entirely, since we have
no way of testing the features of a <var>target</var> when we are still
constructing its cross-compiler.  However, there is a lot of obsolete
cruft in them.  Start by expunging all remaining traces of
libgcc1.</p>

<p>There are also things in there that should be handled by
fixincludes and fixproto, such as INSTALL_ASSERT_H and the corresponding
Makefile magic.</p>

<p>Note that targets do not need to supply a
<code>t-<var>target</var></code> fragment, if it has nothing to do.
Empty fragments can be deleted and all references to them nuked from
<code>config.gcc</code>.</p>
</li>

<li>Get as much out of the <code>x-<var>host</var></code> fragments and
<code>xm-<var>host</var>.h</code> headers into autoconf tests,
<code>system.h</code>, etc., as possible.

<p>I am fairly sure that all of these files can be eliminated
completely, and their infrastructure done away with.  Information in
them is in six categories:</p>

<ol>
  <li><p>Historical dead wood: definitions of macros or Make variables
      that are no longer used for anything, definitions that are
      invariably overridden by something else, etc.  Some files contain
      only comments!</p></li>

  <li><p>Things that belong in <code>system.h</code> or
      <code>ansidecl.h</code>, such as definitions of
      <code>TRUE</code>.</p></li>

  <li><p>Things that belong in a <code>tm.h</code> or
      <code>t-<var>target</var></code> file.  E.g. <code>x-linux</code>
      has no business saying not to run fixproto,
      <code>xm-interix.h</code> has no business specifying how to run
      global constructors.</p></li>

  <li><p>System category assertions, which should be replaced by feature
      checks, but we have to do work in machine-independent code
      first.</p></li>

  <li><p>Feature assertions, which should be replaced by autoconf
      probes.  Some of these are there because at the time they were
      written, autoconf couldn't detect whatever it was.  Note that
      all the autoconf tests have to work when the compiler is itself
      being cross-compiled (with exceptions when we can do graceful
      degradation, e.g. the mmap tests).  Others are there because the
      autoconf test for the feature in question breaks in the presence
      of a buggy host compiler and/or library.</p>

      <p>In principle there is no reason why all of the feature
      assertions can't be replaced by autoconf probes, with sufficient
      cleverness.  The hardest ones will probably be
      <code>{SUCCESS,FATAL}_EXIT_CODE</code>.  Note that autoconf 2.50
      has sufficient tricks up its sleeve to do
      <code>HOST_BITS_PER_*</code> even when cross compiling.</p></li>

  <li><p>Information on how to deal with file systems which are not
      Unix-y.  For instance, definitions of
      <code>PATH_SEPARATOR(_2)</code> and/or
      <code>HAVE_DOS_BASED_FILE_SYSTEM</code>, a complete override of
      <code>INCLUDE_DEFAULTS</code> for VMS, etc.</p>

      <p>This stuff is harder to deal with than the others.  For DOS,
      we could restructure the machine-independent code so there was
      just one switch, namely <code>HAVE_DOS_BASED_FILE_SYSTEM</code>,
      and autoconf could set that based on the host machine name.  We
      probably want to go in that direction anyway.  See "Library
      infrastructure," below.</p>

      <p>I don't know what to do about VMS.  It is utterly different,
      although I'm told the system libraries mask a lot of the
      differences these days.  I would be very surprised if GCC
      actually builds on <samp>{alpha,vax}-dec-*vms*</samp> right now.</p></li>
</ol>
</li>

</ul>

<h2>Library infrastructure</h2>

<p>These tasks are about improving the utility routine library used by
GCC.  If you like data structures, these may be for you.</p>

<ul>
<li>Find private implementations of general data structures, and make them
use library routines instead.

<p>For example, there are hand-rolled hash tables all over the place.
Most of them should be using libiberty's <code>hashtab.c</code>
instead.  However, there are at least three places where we
deliberately use custom code for performance reasons, so be careful.</p>
</li>

<li>Write nifty pseudo-template versions of existing general data
structures to avoid abstraction penalties.

<p>This is for someone who likes working with preprocessor macros, and
can use them cleverly but still readably.  Start with
<code>hashtab.c</code> and <code>splay-tree.c</code> (both in
libiberty).</p>

<p>Once this is done, we can stop avoiding the general code in
performance-critical areas.</p>
</li>

<li>Generalize gcc-specific data structure modules and move them to
libiberty.

<p>For example: <code>[s]bitmap.c</code>, <code>lists.c</code>,
<code>stringpool.c</code>.</p>
</li>

<li>Find private workarounds for host bugs and move them to libiberty.

<p>These tend to be hiding in odd places like the config directory, or
else woven through important areas of code, e.g. the garbage
collector.</p>
</li>

<li>Extract all the code that manipulates pathnames, make sure it can
handle DOS as well as Unix style paths, and move it to libiberty.

<p><code>prefix.c</code>, <code>simplify_pathname</code> in
<code>cppfiles.c</code>, and so on.  Also, make all the DOS handling
conditional only on <code>HAVE_DOS_BASED_FILE_SYSTEM</code>, and get
rid of the <code>PATH_SEPARATOR</code> macros.</p>
</li>

<li>Implement a macro preprocessor for <samp>.md</samp> files.

<p>It should act like the macro processor for <a
href="https://sourceware.org/cgen/">CGEN</a>, which also uses
RTL-ish definition files.  You can start with conditional blocks and
include files.  Remember that we already have define_constants.</p>
</li>
</ul>

<h2>User interface</h2>

<ul>

<li>Fix the places where <samp>-std=c89</samp> is not the same thing
as <samp>-ansi</samp>.</li>

<li>More broadly, make more and more flags consistent across all the
front ends.</li>

<li>Implement a <samp>-Wstd</samp> switch that turns on all warning
flags useful in well-written standard-compliant code (for C,
<samp>-Wstrict-prototypes -Wmissing-prototypes
-Wwrite-strings</samp>).  (Should this imply <samp>-Wall</samp>?
<samp>-W</samp>?)</li>

<li>Implement fine-grained warning control, e.g. disabling a specific
warning by name.  A <a
href="https://gcc.gnu.org/ml/gcc/2000-06/msg00639.html">message to the
gcc list</a> discusses one possible design (based on the gettext
principle of matching against message text rather than assigning other
unique identifiers to each message).</li>

<li>Teach collect2 to recognize when an object module requires a
specific runtime support library and link it in automatically.

<p>That is, if the first linker invocation spits out undefined
symbols, see if they are from libstdc++, libf2c, etc. and throw in the
appropriate library on the second pass.  This would pretty much
eliminate the need for language specific drivers.</p>

<p>It would be neat if it would recognize when libm was necessary,
too.  (No more "where's <code>sqrt(3)</code>?" bug reports!)</p>
</li>
</ul>

<h2>Optimizer improvements</h2>

<p>These require some knowledge of compiler internals and substantial
programming skills, but not detailed knowledge of GCC internals.
I think.</p>

<ul>
<li>Make <code>insn-recog.c</code> use a byte-coded DFA.

<p>Richard Henderson and I started this back in 1999 but never
finished.  I may still be able to find the code.  It produces an order
of magnitude size reduction in <code>insn-recog.o</code>, which is
huge (432KB on i386).</p>
</li>

<li>Make GCSE (and CSE?) capable of digging inside PARALLELs.

<p>This is needed for GCSE to do any good at all on i386.</p>

<p>Here's some dialogue on the subject, which unfortunately may only
confuse you.</p>

<verbatim> <!-- Work around a MetaHTML 5.091 bug, where div is considered a builtin. -->

<blockquote class="mail">
<div>Michael Meissner:</div>
<div>
Actually I would imagine gcse handles clobbers [inside parallels] just
fine and dandy, since it uses <code>single_set</code> which strips off
the clobbers/uses if there is only one set.  What it doesn't handle is
a parallel that has two sets, which on the x86 is for setting the
condition code register.  This probably applies to more phases than
just gcse (look for <code>single_set</code>).  Another place a
parallel with 2 sets is used is for machines that do both the divide
and modulus in one step.</div>
</blockquote>

<blockquote class="mail">
<div>Richard Henderson:</div>
<div>
Those don't get created until combine.
<p>No, the real problem is that gcse doesn't handle hard registers,
so the clobber of hard register 17 (flags) squelches everything.</p>
</div>
</blockquote>

<blockquote class="mail">
<div>Daniel Berlin:</div>
<div>
The comment above hash_scan_insn claims it doesn't handle clobbers in
parallels, yet the code appears to.
</div>
</blockquote>

</verbatim> <!-- Work around a MetaHTML 5.091 bug, where div is considered a builtin. -->

</li>

<li>Find all the places that simplify RTL and make them use
<code>simplify-rtx.c</code>.

<p>Here is some commentary from there:</p>
<blockquote>
<p>Right now GCC has three (yes, three) major bodies of RTL simplification
code that need to be unified.</p>
<ol>
<li><code>fold_rtx</code> in <code>cse.c</code>.  This code uses
various CSE specific information to aid in RTL simplification.</li>
<li><code>combine_simplify_rtx</code> in <code>combine.c</code>.
Similar to <code>fold_rtx</code>, except that it uses combine specific
information to aid in RTL simplification.</li>
<li>The routines in this file.</li>
</ol>

<p>Long term we want to only have one body of simplification code; to
get to that state I recommend the following steps:</p>
<ol>
<li>Pore over fold_rtx and simplify_rtx and move any simplifications
which are not pass dependent state into these routines.</li>
<li>As code is moved by #1, change <code>fold_rtx</code> and
<code>simplify_rtx</code> to use this routine whenever possible.</li>
<li>Allow for pass dependent state to be provided to these routines
and add simplifications based on the pass dependent state.  Remove
code from <code>cse.c</code> and <code>combine.c</code> that becomes
redundant/dead.</li>
</ol>

<p>It will take time, but ultimately the compiler will be easier to
maintain and improve.  It's totally silly that when we add a
simplification that it needs to be added to four places (three for RTL
simplification and one for tree simplification).</p>
</blockquote>
</li>

<li>Convert <code>reorg.c</code> to use the flow graph.

<p>Then we can throw away <code>resource.c</code>. Long term we want
reorg folded into the scheduler, but that's much harder.</p></li>

<li>Improve <code>dwarf2out.c</code>.

<p>DWARF2 can handle all kinds of heavy optimizations that we'd like
to do, but our generator doesn't know how just yet.  At the very least
it'd be nice if <samp>-gdwarf-2 -fomit-frame-pointer</samp> could give
you a clean backtrace on all targets where DWARF works.  (This is
definitely possible.)</p>

<p>You need to coordinate with the gdb team.  It does no good for gcc
to generate fancy debug info if the debugger doesn't understand
it.</p>
</li>

<li>Implement clusters of branch tables as a method of handling
<code>case statements</code>.

<p>Currently gcc has three different methods for handling case/switch
statements.  If the labels form a dense cluster a branch table is used.
Otherwise if it seems sensible a set of bit test and branch instructions are
used.  Failing that a set of compare and branch instructions are generated.</p>

<p>A useful optimization would be to detect the situation where there is more
than one cluster of labels and use compare and branch instructions to choose
the correct cluster and then a branch table to select the correct label.</p>

<p>This optimization has been tried before, as may be seen by
<a href="https://gcc.gnu.org/ml/gcc-patches/2004-07/msg02479.html">this email
thread</a>.</p>
</li>
</ul>

<h2>C/C++ front end</h2>

<ul>
<li><p>Clean up <code>special_function_p</code> and other handling of
functions with names implying given properties.</p>

<p>All properties <code>special_function_p</code> determines ought to
be specifiable with attributes as well.  Where
<code>special_function_p</code> checks for a function not defined by
ISO C, the attribute ought to be added by fixincludes rather than
presuming anything about its semantics within the compiler.  All this
special handing should be disabled by <samp>-ffreestanding</samp>.
Where the function is defined by ISO C (and possibly where it has a
name reserved by ISO C), it should be declared as a built-in function
with the attribute in <code>builtins.def</code>.</p></li>

<li>Find warnings that only the C front end does that would make sense
with C++ and have the C++ front end support them as well, sharing code
if possible.  And vice versa.</li>

<li>More generally, share more code between the C and C++ front ends.</li>

</ul>

</body>
</html>
